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Abstract. We consider a study of players employed by teams who are members 
of the National Basketball Association where units of observation are functional 
curves that are realizations of production measurements taken through the course 
of one’s career. The observed functional output displays large amounts of between 
player heterogeneity in the sense that some individuals produce curves that are 
fairly smooth while others are (much) more erratic. We argue that this variability 
in curve shape is a feature that can be exploited to guide decision making, learn 
about processes under study and improve prediction. In this paper we develop 
a methodology that takes advantage of this feature when clustering functional 
curves. Individual curves are flexibly modeled using Bayesian penalized B-splines 
while a hierarchical structure allows the clustering to be guided by the smoothness 
of individual curves. In a sense, the hierarchical structure balances the desire to 
fit individual curves well while still producing meaningful clusters that are used 
to guide prediction. We seamlessly incorporate available covariate information to 
guide the clustering of curves non-parametrically through the use of a product 
partition model prior for a random partition of individuals. Clustering based on 
curve smoothness and subject-specific covariate information is particularly im¬ 
portant in carrying out the two types of predictions that are of interest, those 
that complete a partially observed curve from an active player, and those that 
predict the entire career curve for a player yet to play in the National Basketball 
Association. 

Keywords: Product partition models, Nonparametric Bayes, Penalized splines, 
Hierarchical models, Right censored data, NBA player production curves. 


1 Introduction 

In multi-subject studies where observations are considered to be functional realizations, 
it is common to observe large amounts of between-subject heterogeneity in the sense 
that some subjects produce curves that are quite smooth while others produce curves 
that are (much) more erratic. Although not often explicitly considered when modeling 
these types of data, the variability in curve shape can be an important feature that 
may help distinguish individuals and lead to better understanding of processes under 
study and/or better predictions. Often in these studies two types of predictions are 
desired: those that predict the functional output across the entire time domain for 
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a hypothetical new subject and those that complete a partially observed functional 
response for subjects currently participating in the study. Explicitly considering curve 
shape in modeling, in addition to considering relevant covariates, should improve both 
types of predictions. This is particularly true of the application motivating the present 
study. Decision makers of teams that belong to the National Basketball Association 
(NBA) are very interested in being able to group and predict future performance of 
basketball players that are employed or could possibly be employed by NBA teams. 

Hay Alton Jarron Collin* Erick Danptor 
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Figure 1: Career game-by-game Game Score results for three NBA players. A loess 
smoother with span equal to 0.3 is also provided. 

The NBA is a North American professional mens basketball league that arguably 
employs the worlds most gifted basketball players. The league’s popularity has steadily 
increased and as a result player salaries have exploded. Personnel decisions in the NBA 
(as in most professional sports leagues) are high-risk transactions. In the face of mas¬ 
sive amounts uncertainty teams offer players guaranteed multi-year multi-million dollar 
contracts and as a result mistakes in player acquisition are extremely expensive. Mak¬ 
ing things even more treacherous are the abstruse rules governing player transactions 
found in the collective bargaining agreement (CBA). Among other things, the CBA reg¬ 
ulates the amount of resources dedicated to player acquisition. Teams that misallocate 
player salary resources by over paying severely hinder a team’s future flexibility and 
negatively impact a team’s future competitiveness and profitability for years. Because 
of this, added value might be assigned to players who perform consistently compared 
to those that are more up and down. 

Figure 1 displays scatter-plots and loess curves (with a span of 0.3) of game-by-game 
“production” for three NBA basketball players. Game-by-game “production” in Figure 
1 is measured using the so called Game Score statistic (Hollinger 2002). More details 
are provided in Section 2 and the Appendix, but for now it suffices to know that higher 
values correspond with better performance and more production. Even though there 
is a large amount of game-to-game and player-to-player variability in Game Score it is 
still evident that production consistency between the three players varies. Erick Dampier 
appears to have two spikes of improved production, Jarron Collin’s production oscillates 
during the first part of his career while Ray Allen’s production is fairly smooth as it 


G. L. Page and F. A. Quintana 


381 


gradually increases and decreases with slight dip in the middle. Therefore curve shape 
should contain information that is valuable in distinguishing between different types of 
players and being able to assess their future value. Two types of predictions are used 
to assess future value. The first considers players who are currently members of the 
NBA and that will continue to participate in future games. Ray Allen in Figure 1 is an 
example of such a player. Although he has already played in more than 1000 games, 
he continues to play and predicting the performance for the remainder of his career 
is of considerable interest. This type of prediction will be referred to as “active player 
prediction”. The second type of prediction considers basketball players who have yet 
to play in an NBA game but who have a skill set that will attract interest from NBA 
teams. For these players, predicting the entire career production curve is of interest. 
This type of prediction will be referred to as “career prediction”. 

It has become fairly common to consider the longitudinal curves of the type just 
described as discretized realizations of functional data. There is now a large literature 
dedicated to functional data analysis (FDA) techniques. A few popular methods that are 
actively being researched are functional principal components (Ramsay and Silverman 
2005, chap. 6, Di et al. 2009), Gaussian process regression methods (Rasmussen and 
Williams 2006, Behseta et al. 2005, and Zhu and Dunson 2013) and multi-level functional 
basis expansion (Morris and Carroll 2006, DiMatteo et al. 2001, Biller 2000, Montagna 
et al. 2012, Bigelow and Dunson 2007). When considering multiple-subject studies the 
methods just described tend to separate individuals according to trend levels only, while 
ignoring the shape of the longitudinal projections. Though the idea of explicitly using 
shape or smoothness of curves to improve prediction is intuitively appealing there is 
surprisingly very little in the statistical literature dedicated to it. The one article that 
we are aware of is Zhu and Dunson (2012) whose focus is on estimating rate functions 
through a complicated system of differential equations and using covariates to explain 
variability in trajectories via stochastic volatility models. They applied their method to 
a longitudinal multi-subject blood pressure study for pregnant women and noted that 
blood pressure trajectories for normal women were more smooth relative to women with 
preeclampsia. We however, take an entirely different approach. Instead of dealing with 
a complicated system of differential equations we incorporate curve shape in modeling 
through a penalty parameter analogous to that found in penalized splines. 

Our model involves an implied distribution on partitions of players. The allocation 
variables are treated as parameters and thus our approach may be seen as an extension 
of latent class analysis (LCA) (Collins and Lanza 2010) which classifies individual player 
curves into K pre-specificed clusters (see Dean and Raftery 2010 and Elliott et al. 2005). 
Unlike LCA, our methodology does not require a fixed pre-specified number of clusters 
as this is inferred from the corresponding posterior distribution on partitions. We briefly 
note that there does exist a small literature dedicated to estimating certain aspects of 
functional output such as dynamics (the speed of price increases and the rate at which 
this speed changes) that depend on covariate information (see Wang et al. 2008 and Zhu 
et al. 2011 and references therein). But these are not relevant to the current setting as 
they fail to deal with multiple-subject studies nor do they use curve shape in prediction 
and inference. 
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In sports, Berry et al. (1999) model career trajectories (or aging curves) non para¬ 
metrically in order to make historical comparisons of player’s abilities in baseball, hockey 
and golf. Connolly and Rendleman Jr. (2008) consider career paths of golf players to 
determine combinations of luck and skill required to win a golf tournament. Neither of 
these works were interested in grouping players to carry out career and active player 
predictions. 

As noted, our principal goal is making active player and career predictions. If pre¬ 
dictions are computed using methods that treat individual players independently, then 
both types of predictions would be extremely poor as they would not be data driven. 
One way of improving predictions is by borrowing strength (or sharing information) 
among players whose career production curves might be deemed similar. A straight¬ 
forward way of borrowing strength is by introducing player clusters. However, if all 
individuals of the same cluster are restricted to have the same curve, then some indi¬ 
viduals will invariably be poorly fit (too much borrowing of strength). Alternatively, if 
curves of all individuals of the same cluster are completely unrestricted, then clustering 
players would provide no predictive information (too little borrowing of strength). The 
methodology proposed in this article is able to balance very well the desire to produce 
good fitting individual curves while still producing clusters that allow enough borrowing 
of strength among similar players to guide prediction. This is carried out by employing 
a hierarchical model where subject-specific functions are modeled flexibly through a 
linear combination of basis functions whose coefficients are drawn from cluster-specific 
process level distributions. Doing this produces flexible subject-specific curves while still 
being able to produce reasonably accurate predictions by pooling together players with 
similar features/performances. 

In our model, having covariate dependent clusters is crucial to carrying out career 
prediction as these are produced using the predictive distribution available from the 
covariate dependent clustering mechanism. Also, shape dependent clusters are useful 
to carrying out active player predictions as incomplete active player curves are filled 
in using curves of retired players that have similar career trajectories. There has been 
work regarding completing curves (Goldberg et al. 2014 work out Best Linear Unbiased 
Predictors (BLUPS) for past and future curve segments) and local borrowing of strength 
to fit global curves (Petrone et al. 2009 employ functional Dirichlet processes to group 
Gaussian process realizations), but the approaches developed and purposes are very 
much different from the present study. 

The remainder of the article is organized as follows. Section 2 describes the data 
collected and employed in the analysis. Section 3 provides details regarding the develop¬ 
ment of the methodology highlighting model components associated with cluster-specific 
curve smoothness and active player prediction. Section 4 provides details regarding com¬ 
putation of posterior and predictive distributions. In Section 5 we provide details of a 
small simulation study. Results from the analysis of the NBA application are provided 
in Section 6. Finally, we provide some concluding remarks in Section 7. 
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2 Description of Data 

We collected common game-by-game (including playoffs) modern basketball metrics 
for each player drafted into the NBA during the years 1992-2004 (Shaquille O’Neal 
to Dwight Howard) that participated in at least one game up through the 2009/2010 
season. This resulted in 576 players with number of games played ranging from 2 to 
1383 games. A few of the players appeared in very few games and are not representative 
of a typical NBA player. Because of this, and to reduce the noise introduced by the 
careers of players that contain little information regarding the processes of interest, we 
restrict our attention to players with at least three seasons of experience (the rookie 
contract length of the 2005 CBA). Also, to retain enough games to get a reasonable 
sense of a player’s ability we only include players who played at least a half a season’s 
worth of games (42). Finally, we excluded the following 8 players whose careers were cut 
short either by career ending injuries or untimely deaths: Bryant Reeves, Malik Sealy, 
Eddy Curry, Jason Collier, Eddie Griffin, Yao Ming, T. J. Ford, and Gilbert Arenas. 
This resulted in 408 players with number of games played through the 2009/2010 season 
ranging from 45 to 1383. Of the 408 players, 263 are classified as “retired” as they did 
not play beyond the 2009/2010 season. 

Measuring game-by-game production is not straightforward as there are numerous, 
difficult to measure factors that influence player performance. Because of this no gold 
standard basketball production metric exists. That said, one that has become somewhat 
popular is John Hollinger’s so called Game Score which is a linear combination of 
common variables that are recorded for each player through out the course of a game 
(e.g., number of baskets made and number of steals acquired. More details can be found 
in the Appendix and at Hollinger 2002). This metric will Ire used as our response variable 
and therefore a representation of a player’s game productivity. Though Game Score has 
deficiencies (e.g., weighted heavily towards offensive output and doesn’t account for 
quality of opponent), it provides a fairly accurate indicator of player production for 
any given game. The maximum Game Score collected is 63.5 (corresponding to Kobe 
Bryant’s 81 point game). The minimum Game Score was -9.9 and the average Game 
Score among all players is 8.1. An alternative to raw Game Score is a standardized Game 
Score where standardization is carried out by dividing Game Score by the minutes played 
in each game, thus removing Games Score’s dependence on minutes played. However, 
players whose production is not negatively impacted by increased minutes are more 
valuable than those who are less efficient with increased game time and distinguishing 
between these types of players is desirable. For this reason we opt to use raw Game 
Score values. 

For an aging (or time) variable there are various units that could be used. For 
example, age, number of accumulated minutes played, or simply the number of games 
played are all reasonable. Since each of these measurements are only available to us 
on a game-by-game basis, the shape (or smoothness) of the curve remains unchanged 
regardless of age unit employed. Thus for sake of expositional clarity we use number of 
games played (see Figure 1 as an example). 

Through exploratory analysis we identified three covariates that in addition to being 
of interest in their own right are informative in grouping players. These are age during 
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first game played (measured in years), experience before being drafted into the NBA 
(High School basketball only, some Collegiate basketball, or International basketball), 
and draft order. Draft order is the order in which players are selected in their respective 
drafts. For example, a player with draft order 1 implies he was the first player selected 
in his respective draft, and draft order 2 implies he was the second player selected etc. 
In Section 3 we describe how draft order is used explicitly to predict total games played 
for active players, but as a covariate used to influence clustering we categorize a player’s 
draft order as being a top five pick, a first round pick (excluding the first five) and a 
second round pick. (Since 1989 the NBA draft has consisted of two rounds.) Table 1 
provides the number of players in each of the nine categories. Other baseline covariates 
were considered such as position, height, and other physiological characteristics but 
preliminary research indicated they were not useful in partitioning players with regards 
to production. 

Table 1: Total number of players in each of the nine categories. 

Experience 

Draft High School College International 


Top 5 

7 

51 

4 

1st Round 

15 

200 

25 

2nd Round 

1 

98 

8 


3 Model Description and Development 

We first consider the model’s clustering mechanism highlighting its dependence on 
subject-specific covariates. Secondly, the likelihood structure incorporating the num¬ 
ber of games played and career length (which are right censored for active players) is 
detailed. Lastly, we describe the hierarchical component which balances goodness of 
individual fit with ability to produce clusters that are able to guide prediction. 

3.1 Product Partition Model with Covariates (PPMx) 

Let i = 1,..., m index the m players in the study. Further, let p = {Si,..., Sk m } 
denote a partitioning (or clustering) of the m individuals into k m subsets such that 
i £ Sj implies that individual i belongs to cluster j. Alternatively, we will denote 
cluster membership using Si,..., s m where S{ = j implies i £ Sj. Let Xi = [xn , Xa , Xa) 
denote player z’s covariate vector with xn corresponding to age, Xa experience and xa 
draft order. Let x* = {xi : i £ Sj} be the partitioned covariate vector. Our approach 
is to first directly model p with the covariate dependent product partition model of 
Muller et al. (2011) (which will be referred to as the PPMx model) and then construct 
a hierarchical model given the partition (as opposed to introducing latent variables 
that indirectly induce a partitioning of individuals). The PPMx prior incorporates the 
idea that individuals with similar covariate values are more likely a priori to belong to 
the same cluster relative to individuals with dissimilar covariate values. Additionally, 
this prior is very simple, highly customizable, seamlessly incorporates different types of 
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covariates (e.g., continuous or categorical), and is particularly well suited for prediction 
(something that is of interest here). An alternative method not considered here can be 
found in Park and Dunson (2010). The PPMx prior consists of a cohesion function, 
c(Sj) > 0 for Sj C {1,... ,n}, and a nonnegative similarity function g(x*). The former 
measures the tightness of how likely elements of Sj are clustered a priori and the latter 
formalizes the similarity of the Xi s by producing larger values of g{x*) for x^s that are 
more similar. The form of the PPMx prior is simply the following product (for more 
details see Muller et al. 2011) 


fc m 

P(p\x) cc]Jc(Sj)g{x*). (1) 

. 7=1 

A simple example of a cohesion function that produces a Dirichlet Process type 
partitioning is c(Sj ) = M x (|Sj| — 1)! for some positive M and | • | denoting cardinality. 
Regarding possible similarity functions, Muller et al. (2011) provide a bit of exposition 
for different types of covariates (e.g., continuous, ordinal, or categorical). Generically 
speaking they suggest the following structure 

g( x j) = f II '/ ( Ti Cyl'/KjX/C,. ( 2 ) 

where Q is a latent variable and q(-\Cj) and q(-) are (typically) conjugate probability 
models. This structure is not necessarily used for its probabilistic properties (indeed x 
is not even random), but rather as a means to measure the similarity of the covariates in 
cluster Sj. In reality any function that produces larger values as the entries of x* become 
more similar can be considered as a similarity function. For example g{x*) = exp{—s|} 
where Sj is the empirical variance of x * is a completely reasonable similarity function 
for continuous covariates. 

It turns out that the similarity function (2) coupled with cohesion function c(Sj) = 
M x (| Sj | — 1)! produces the same marginal prior distribution on partitions as that 
induced by using a Dirichlet process (DP). For more details see Miillcr et al. (2011). 

Given p we may proceed to specify a hierarchical model that flexibly models individ¬ 
ual curves. Before doing so, we very briefly introduce a few pieces of notation that will 
be used. In what follows cluster-specific and subject-specific parameters will need to be 
distinguished. If we let 0, denote some generic subject-specific parameter vector, then 
0* will be used to denote a cluster-specific parameter in the sense that i € Sj implies 
that Qi = 0*. Alternatively, cluster labels (si,..., s m ) can be used to connect subject 
and cluster specific parameters through 0, = 0 *.. Lastly, vectors of subject-specific and 
cluster-specific parameters are denoted by 6 = (#i,..., 6 m ) and 9 * = ( 0 *, ..., 9 * k ). 

3.2 Likelihood 

To distinguish between players that play beyond the 2009/2010 season we use the fol¬ 
lowing indicator variable 
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JO if player i retired before or at the conclusion of the 2009/2010 season 
J 1 if player i played beyond the 2009/2010 season. 

Let rii denote the total number of games played in player *’s career. If (ji = 0 then 
rii is observed otherwise a lower bound denoted by hi is observed such that rii > hi. 
Thus, we are dealing with right censored type observations and we will incorporate 
ideas developed for modeling them. We denote the response vector for players whose 
gi = 0 with yi = (yn,... ,y ini ) otherwise y t = (yn ■ ■ ■, Vifn) ■ The career production 
curve for players whose gi = 1 needs to be “completed” which requires the prediction 
or imputation of rii. 

Predicting rii is not trivial (even given hi) because it is highly variable and demon¬ 
strates a strong association with very few covariates. One covariate we found that dis¬ 
plays a strong association with rii is career length (denoted by Li and measured in 
years). Unfortunately, this variable is also right censored and for gi = 1 we only observe 
Li. However, we consider Li because it displayed a stronger association with the uncen¬ 
sored variable draft order (denoted by di) than that found between rii and di. Therefore 
we employ di to first impute Li and then use Li to predict rii. Thus, the likelihood for the 
ith player is composed of the random variables (y 2 :, c/i,ni :gi= o,ni :gi =i,Li: gi =o,Li: gi =i) 
which we model jointly by way of 

P(yiigii'n i i'.gi=0i1 r li:gi — liLi : g i= 0)Li : g i= l) = \p{lji\ni, Li)p(jli\Li)p(yLi)) 9 

x [p(yi\ni, Li)p(hi\Li)p(Li)] 9i . 

We now detail each of the three likelihood components. 

When only considering retired players we found that the association between di and 
Li was somewhat nonlinear. (This is reasonable considering that our pool of players 
consists of only those who played at least one NBA game thus retaining only the “good” 
2nd round picks.) Because of this, we assume Li ~ N(vi,i/j 2 ) where 17 = 70 + 71 ^+ 72 ^?- 
However, the association between rii and Li was fairly linear so we assume rii\Li ~ 
N(r)i,5 2 ) where rji = a 0 + «i L t . Thus, (n i:gi=0 ,n i:g ,. = i,Lj :gi =o,Lj: Si =i)’s contribution 
to the likelihood is 

[p{rii\Li)p{Li)] l ~ 9 i \p(hi\Li)p{Li)] 9i = [N(m; yi, S 2 )N(Li; v u i/j 2 )\ 1 9 ' 



where iV(-;m, s 2 ) denotes a Gaussian density function with mean m and variance s 2 
and $(•) denotes a standard normal cdf. As a result, imputing n, for active players is 
carried out by first imputing Li using a quadratic model with di. 

We briefly note that although a Poisson model for rij might seem natural, it is not 
appropriate in the current context as the simultaneous increasing of the mean and vari¬ 
ance of the Poisson distribution seems to contradict what is empirically observed. Thus, 
for simplicity, we elected to employ a Gaussian to model n* and round the predictions 
(something that is not uncommon, see page 458 of Gelman et al. 2013). Also, modeling 
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m non-parametrically could potentially improve prediction but we elected to employ the 
simpler parametric model as its predictions were satisfactory for our purposes. Nonethe¬ 
less, predicting rii is of considerable interest in its own right to NBA decision makers 
and could be an interesting future research project. 

Finally, given n* and letting denote the ith player’s underlying production 

curve value for the tth game played (denoted by Zu), we model measurements yu as 

2 lit = Poi + fi{z it ) +e it for t = 1,...,n* (hi for g 1 = 1) (3) 

where eu ~ iV(0, of) independently across i. It is possible that incorporating a more 
sophisticated error model (such as autoregressive errors) could prove to be beneficial, but 
for simplicity we maintain independence. A fairly popular method of characterizing _/)(•) 
is to define a collection of basis functions (e.g., wavelet, polynomial) and assume that yu 
lies in their span. We adopt this method and employ a B-spline basis as it has a number 
of attractive computational properties and facilitates active player prediction as will be 
detailed shortly. Therefore, _/)(•) can be written as the following linear combination 

Pi 

fi(zu) = Puhe(zi t ;£i) ( 4 ) 

t=i 

where h((z',^i) denotes the £th B-spline basis function evaluated at knots contained 
in If pi denotes the number of inner knots and q the spline degree, then Pi = 
Pi + q + 1- Now define Hi as the rii x Pi matrix with rows {hi(zu), ..., hp i (zu)} for 
t = 1,...,rij (hi for gi = 1), and /3, = {fin, ..., Pip t }- Combining (3) and (4) produces 

Vi = poiU + Hifii + ei for e» ~ AT(0, crfl ni ), (5) 

where 1 j denotes a vector of ones and I ni an identity matrix. 

The dimension of Hi depends on rii (and h, for active players). This coupled with 
the fact that B-splines form a local basis in that each basis function is non-negative 
only on an interval formed by q + 2 adjacent knots can be exploited to carry out active 
player prediction. Since for any fixed zu at most q+1 basis functions are positive, the 
predicted value of rii for active players will determine the number of zero columns in 
Hi. Thus, the section of an active players curve corresponding to the zu values between 
rii and hi are completely informed by the cluster specific curve or in the case that the 
player belongs to a singleton, the grand mean curve (more details are in Section 3.3). 
Using Hi to denote the design matrix that incorporates the predicted value of rii based 
on hi, the full likelihood for © = (/3, /3o, er 2 ,r], u, ip 2 , S 2 ) is 

t(Vi, ■ ■ ■ ,Vm,n, L,n, L,g\@) 

n 

= n [ N ni(yi',PoiU + HiPi^ilnPNim^rii^^N^i-Ui^ 2 )] 1 :Jl X 
2=1 

Nn t (y,:,0Oih. + HiPi, of In, ) 11 - $ } j 1 $ j | 


X 


( 6 ) 
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3.3 Hierarchical Model 

The number and location of the inner-knots that make up are rarely known. Their 
selection is crucial to producing an attractive curve without over-fitting. So called free- 
knot splines is a very flexible method that treats as an unknown and has proved 
to be quite parsimonious in knot selection. (DiMatteo et al. 2001 and Denison et al. 
2002 provide a nice overview.) Therefore, a possible direction to incorporating shape 
variability in prediction as desired would be to base clustering on the number and 
location of knots. However, to fit a free-knot spline some type of transdimensional 
Markov Chain Monte Carlo (MCMC) algorithm is often employed and this coupled 
with the PPMx prior for p would result in a doubly transdimensional MCMC algorithm 
that would become prohibitively expensive. To avoid these computational issues and 
to make the methodology more readily accessible, for each subject we instead select 
a moderate number of equally spaced knots within the knot domain and employ the 
Bayesian P-spline technology of Lang and Brezger (2004). Now shape variability can 
influence clustering through the penalty parameter of the P-splines. However, to retain 
flexible subject-specific fits, we use P-splines as a prior distribution of process level 
parameters and allow subject-specific coefficients to vary around a cluster-specific mean. 
That is, we assume the following process level structure for the /3’s: 

~ N(0*. , Ag*J) with ~ UN(0,A), (7) 

and use a Bayesian P-spline prior for the 9 *'s (with UN( •, •) denoting a Uniform dis¬ 
tribution) . A particularly nice feature of the methodology is the explicit ability to con¬ 
trol the similarity between individual curves and their group counterparts through the 
hyper-parameter A. 

In order to highlight two departures from the Bayesian P-splines of Lang and Brezger 
(2004) required by the present modeling we very briefly introduce them here. For more 
details see Lang and Brezger (2004) and Fahrmeir and Kneib (2005). Bayesian P-splines 
are the Bayesian analogue to splines penalized by ci-order differences and are constructed 
around d-order Gaussian random walks. For example, for d = 1 

9ji = 0j,e-i + Uj e. £ = 2 ,...,n (8) 

with Uji ~ N( 0, t?*). Typically p(d*j) oc v, but an improper prior is not an appropriate 
probability model for the Polya urn representation used in the PPMx. Thus, similar 
to what was done in Telesca and Inoue (2008) we assume ~ N(0,Tj*/v 2 ) (with 
analogous extensions for d > 1). The value v can be assigned a prior distribution or 
be set to a fixed value. Equation (8) together with the 9* x ~ 7V(0,tJ* / v 2 ) produce 
9* ~ N{0, rfK~ x ) where K is a banded penalty matrix with v incorporated, rj* is 
the smoothing parameter associated with Bayesian P-splines and is crucial in being able 
to distinguish between individuals based on the smoothness of their respective curves. 
As suggested by Lang and Brezger (2004) we adopt rj* ~ IG(a T ,b T ) where 
denotes an inverse Gamma distribution and a T and b T are user supplied. 

Recall that active player prediction is carried out by borrowing strength among 
players in a cluster. If player i belongs to a singleton or all members of his cluster 
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are active players, then at least part of his prediction is completely guided by the 
prior on 9*. Since the prior is centered at 0 this would produce poor active player 
predictions. To improve prediction in these situations, we introduce fi as a vector of 
global curve coefficients such that 9* ~ N(fi,T 2 * K~ 1 ) with fi ~ iV(0, sj)J). Includ¬ 
ing fi potentially influences the values of 9* in that smaller magnitudes achieve the 
same amount of smoothing as when fa = 0. This should be taken into account when 
selecting values for a T and b T . Also, apart from improving prediction, fi is of interest 
in its own right as it provides information regarding an average career curve among all 
players. 

We end the description of our Bayesian P-spline approach with details regarding 
knot selection. A complicating factor of knot selection in modeling these data is the 
massive misalignment associated with the number of games played for each of the play¬ 
ers. Making things worse is the inherent discontinuities in games played through out 
the course of one’s career (e.g., the offseason, injuries, etc.) that we are not consider¬ 
ing. There is a “curve registration” literature dedicated to better aligning functional 
domains in multi-subject studies (Telesca and Inoue 2008). However, we align career 
paths by matching the percentile number of career games played. This is carried out 
by transforming “time” to the unit interval which greatly simplifies the process of se¬ 
lecting £*. Therefore z* t = Zu/rii is used instead of Za- (For retired players rii is the 
observed number of games played and the predicted for active players.) Thus for re¬ 
tired players z* n . = 1 while for active players z* n . < 1. Now & can be a knot set that 
partitions the unit interval into equal subintervals and since it does not depend on rii 
it can be the same for all players. We do note that aligning career paths in this way 
is imperfect as the 95th percentile of games played for one player might be during his 
third season while for another player during his fifteenth season. Even so, we believe 
that matching curves by way of percentile of games played produces coherent compar¬ 
isons and valid borrowing of strength. With an enriched data set we could attempt to 
take into account possible discontinuities in career paths. (This would actually be very 
interesting as many players improve during the off season.) It would be fairly straight¬ 
forward to expand the model in a variety of ways, but the base model proposed would 
continue being the work horse even as other more idiosyncratic aspects of the data are 
considered. 

With regards to modeling pa and of there are any number of ways one might 
proceed. It seems plausible that of might depend on zu- That said, for sake of simplicity 
we utilize the common prior structure for variance components af ~ IG(a a ,b a ) with 
a a and b a being user supplied. For the subject-specific random intercepts, we use a 
Gaussian-inverse-Gamma hierarchy such that pa ~ N(fib 0 ,cr^ 0 ) with fj,b 0 ~ N( 0, ) 
and a 2 ~ IG(a,b 0 ,bb 0 ). Finally, typical conjugate priors are used for a = (ao,ai) ~ 
N(m a ,s 2 a I), 7 = ( 70 , 71 : 72 ) ~ N(mj,s 2 I), 5 2 ~ IG(a s ,b s ), and ip 2 ~ IG(a^,b^). 

Equation (9) is provided to aid in visualizing how all the moving parts of the hier¬ 
archical model are connected. Through out the remainder of the paper we will refer to 
the entire hierarchical model as HPPMx. 
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x Njii (t/i 5 “I - Hif3i^(T i Ifi i ) 



Qi |®<T5 ba ^ IG(a a , b a ) 


PoiW 0 ,crl 0 ~ N(n bo ,al 0 ) with p bo ~ 1V(0, s^ Q ) and of Q ~ IG(a bo ,b bo ) 
a~N(m a ,s 2 I) and 5 2 ~ IG{as,b b ) 

7 ~ N{m 1 , s 2 I) and 4> 2 ~ IG{a^,, b^) 


(9) 



H ~ N{0,sll) 


Pr{p) oc c(Sj)g(Xj), 


for i = 1 ,,m and h = 1 ,..., k m . 

Before proceeding we make a brief comment regarding some specific model com¬ 
ponents. Since the Bayesian P-splines are used at the prior level rather than the pro¬ 
cess level of the hierarchical model, individual curves are not directly influenced by its 
smoothing penalization. The wiggliness of individual curves is a function of both t 2 * 
and A. As A increases the influence that r 2 * has on individual curves decreases. This is 
investigated further in the simulation study of Section 5. Therefore, if smooth individual 
curves are desired together with large within group variability it may be necessary to 
use 10-15 knots instead of the 20-30 knots recommended by Lang and Brezger (2004). 

4 Posterior Computation 

4.1 MCMC Implementation 

We fit the proposed model to data by simulating a Markov chain whose equilibrium 
distribution is the desired posterior distribution. The algorithm employed is similar to 
Neal (2000) ’s algorithm number 8 in that it can be divided into two basic pieces. The 
first updates the partition p via the Polya urn scheme of Blackwell and MacQueen (1973) 
(and further developed by Quintana 2006) and the other updates the hierarchical model 
parameters using a Gibbs sampler (Geman and Geman 1984 and Gelfand and Smith 
1990) and Metropolis steps (Metropolis et al. 1953). 

To update the cluster membership for subject i, cluster weights are created by com¬ 
paring the unnormalized posterior for the hth cluster when subject i is excluded to that 
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when subject i is included. In addition to weights for existing clusters, algorithm 8 of 
Neal (2000) requires calculating weights for p empty clusters whose cluster specific pa¬ 
rameters are auxiliary variables generated from the prior. To make this more concrete, 
let Sr 1 denote the hth cluster and the number of clusters when subject i is not 
considered. Similarly x^~ l will denote the vector of covariates corresponding to cluster 
h when subject i has been removed. Then the multinomial weights associated with the 
existing clusters and the p empty clusters are 


Pr(si 


h |—) oc 


N{Pi;0* h ,\ 2 h *I) 
-V(/3i, 0 new ji 1 A 


c (s h 2 u{i})g( x h MM) 


for h = 1 ,..., 

- 1 for h = + !)■•■) k-* + p. 


Values for 0 new ^ A^ euJ h are auxiliary variables drawn from their respective priors as 
required by algorithm 8 . Care must be taken when subject i belongs to a singleton 
cluster as removing the ith subject produces an empty cluster. This in turn requires 
relabeling the existing cluster specific components to avoid gaps in the cluster labeling. 


The full conditional distributions of (/3 t , /3 q , of, 0*. rf*) are fairly common deriva¬ 
tions and are provided in the Appendix. To update (A*, 7 , a, <5 , ip 2 ) we employed a 
random walk Metropolis step with a Gaussian proposal distribution. A Markov chain 
can be constructed by employing a Gibbs sampler that first updates p and then on an 
individual basis updates model parameters by cycling through each full conditional and 
using a Metropolis step for the non conjugate parameters. 


4.2 Posterior Prediction Distributions 


A particularly nice feature of using PPMx is the availability of career prediction through 
covariate dependent predictive distributions. Using PPMx, posterior predictive distri¬ 
butions are readily available and can be obtained online in the sense that draws from 
this distribution can be collected within the MCMC algorithm. The posterior predic¬ 
tive distributions depend on covariate values through the allocation of a new individual 
to one of the k m existing clusters or to a new cluster using the following multinomial 
weights 


Pr(s n+1 = h |-) oc 


c(S h U{B+l})s(a;^U{j!„ + i}) r h = 1 

c(S h )g(x* h ) ’' " 

c({n + l})ff({a: n +i}) for h = k m + 1 . 


( 10 ) 


Once the future player has been allocated to a cluster, one carries out the typical Monte 
Carlo integration to sample from the posterior predictive. 


4.3 Predicting rii 

Predicted values of (m, Li) for retired players are produced at each MCMC iteration. We 
essentially employ the multiple imputation ideas of Little and Rubin (1987) but with the 
exception that we are very much interested in the values being imputed. Predictions 
are fairly straight forward as they only depend on the full conditionals of rn and Li 
which turn out to be truncated normal with hi and Li acting as lower bounds (the full 
conditionals are provided in the Appendix). 
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5 Simulation Studies 

We conduct two small simulation studies to investigate the behavior of the HPPMx 
model (9). Recall that the principal motivation in incorporating a hierarchy is to bal¬ 
ance goodness of individual curve fits with the production of meaningful clusters which 
facilitate prediction. Therefore apart from showing improved prediction performance, 
the simulation study explores just how much goodness of individual fit is sacrificed in 
the name of prediction (which is very little as will be seen). The first simulation study 
demonstrates the method’s superior predictive performance by comparing out of sam¬ 
ple mean integrated prediction rates to that of two competitors (which are detailed 
shortly). The second explores how certain model components influence subject-specific 
fits, curve smoothness, and clustering. The two competitors selected represent the ex¬ 
tremes HPPMx attempts to balance, namely fitting each player independently versus 
assigning players cluster-specific curves. The first competitor is a semi-parametric re¬ 
gression model (henceforth SP) that fits individual curves independently and flexibly. 
The second is a semi-parametric regression model with a Dirichlet process prior (hence¬ 
forth SPDP) which produces individual curves that are cluster specific. More precisely 
we consider 


Hit = x'i/3 + fi(zu) + e it with e it ~ N( 0, of) for i = 1,..., m and t = l,...,n (11) 

where ( fi{zn ),..., fi{zi n ))' = HOj is modeled using subject-specific linear combinations 
of B-spline basis functions, z lt £ [0,1], Xj is a vector of covariates that will be described 
shortly and 

SP SPDP 

G t ~N( O.r 2 .^ 1 ) Oi\G~G 

G ~ DP(M, G 0 ) with G 0 = N(0 , t 2 K~ 1 ). 

Notice that under SP a P-spline prior is used for 0, while under SPDP the base mea¬ 
sure of the DP is a P-spline prior. As with HPPMx, r 2 ~ IG(a Tl b T ). The competi¬ 
tors selected, though reasonable, aren’t capable of providing active player predictions. 
Therefore, prediction assessment in the simulation study is only carried out for career 
prediction. Finally, we investigate the influence that covariates have on clustering by 
considering the HPPMx model with a PPM prior rather than a PPMx prior (which will 
be hence forth referred to as HPPM). 

Since both simulation studies employ the same general data generating mechanism 
we provide details here. A response vector is generated using 

yu = boi T fgroupi ( Z it ) + e it with t it ~ N(0,w 2 ) (12) 

where f gr oupi{') corresponds to groupt = 1 ,..., 6 possible mean curves which were cre¬ 
ated using the NBA data as a guide (see Figure 2). The six mean curves are made 
to depend on covariates by creating two categorical covariates that when crossed pro¬ 
duce six categories, one for each mean curve. A continuous covariate was generated by 
x* ~ N(groupi, 0.1). Since x* depends on the two categorical covariates, an interaction 
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0.0 0.2 0.4 0.6 0.8 1.0 

Z 

Figure 2: The six mean curves used in the simulation study. 


between them is created. The three covariates were included in all model fits. Lastly, 
the random intercept is generated using b oi ~ 1V(10,2). 

The factors we explore in the simulation study with their respective levels are 

• value for hyper parameter A (0.1, 1, 10) 

• number of knots (5,15,30) 

• variance of (12) ( w 2 = 0.1, w 2 = 1) 

• number of observations per subject ( n = 50, n = 100). 

A and the number of knots were selected to investigate how P-splines function as a prior 
at the process level instead of at the observation level of a hierarchical model. With n 
and w 2 we see how the methodology performs as more information becomes available 
relative to noise. For each combination of the factor levels 100 data sets with m = 60 (10 
players per group) are generated and for each data set SP, SPDP, HPPM, and HPPMx 
are fit. 

For all four procedures we set a T = 1, b T = 0.05, a a = b a = 1.0 and v = 1. For PPMx 
and PPM s 2 o = s 2 = 100 2 , and for SP and SPDP /3 ~ N(0, 100 2 /). Finally for HPPMx, 
the cohesion and similarity functions employed are those that match the marginal prior 
on partitions implied by a DP prior (see Section 6.1 for more details). Each of the 
four procedures were fit to each synthetic data set using 1000 MCMC iterates after 
discarding the first 5000 as burn-in. Empirically based starting values were employed 
which accelerated convergence making the 5000 iterate burn-in sufficient. 
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5.1 Out of Sample (Career) Prediction 

To assess out of sample (career) prediction, for each of the 100 generated data sets 
100 additional out of sample subjects were generated. The f gr0 up{-) associated with 
each new subject is known and therefore out of sample prediction can be assessed by 
comparing /(■) from the four procedures to f gr0 up{-)- After centering both f grouP:j {•) 
and /)■(•) (i.e. subtract off the empirical mean) for the j th out of sample subject, we 
measure prediction accuracy using the mean integrated squared prediction error 

MISPE, = E / Ifjiz) - f grouPi ( z)fdz « Y, A tE[f(z jt ) - f g rou P {z Jt )} 2 (13) 

where A t = (zjt +i — Zjt)■ Equation (13) essentially measures the average squared area 
between f groupj (•) and fj(-) for the j th out of sample player over z’s domain. The values 
in Table 2 correspond to 

D ^ 100 

04) 

d= 1 1=1 

where d indexes the D = 100 generated data sets. 

From Table 2 we see that HPPM and SPDP provide similar predictions which is to 
be expected as both employ a DP prior (although not at the same level of a hierarchy). 
What should be very obvious is that HPPMx does a much better job in out of sample 
prediction relative to the other three procedures for all data generating scenarios. 


5.2 Goodness of Individual Fits, Curve Smoothness, and Clustering 


To assess goodness of individual fits we employ the following R 2 type goodness-of-fit 
statistic from Ramsay and Silverman (2005): 


- yu) 2 

E t(yn - yi) 2 


(15) 


R 2 can be loosely interpreted as a coefficient of determination in that as R 2 approaches 
1, individual fits improve. Negative values of R 2 indicate that tji predicts better than 
/»(•). The values in Table 3 correspond to 


1 

D 


D 1 m 


m 

d— 1 i=1 


(16) 


From Table 3 we see that SP tends to produce the best individual fits and SPDP 
the worst. This is of course expected as all individuals are fit independently by SP 
while SPDP provides cluster specific curves. However, HPPMx does remarkably well 
in producing good individual fits as HPPMx is very close to SP particularly as n in¬ 
creases. Thus, HPPMx’s meaningful cluster production doesn’t require sacrificing much 
goodness of individual fit. 
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Table 2: Results from the simulation study investigating out of sample prediction. Table 
entries are calculated using (14) with m = 60 players. 

n = 50 n = 100 


Number of knots Number of knots 




Model 

5 

15 

30 

5 

15 

30 



HPPMx 

0.131 

0.119 

0.123 

0.149 

0.144 

0.122 

A 

0.1 

HPPM 

0.891 

0.868 

0.847 

0.873 

0.855 

0.848 


SP 

1.310 

1.301 

1.319 

1.297 

1.288 

1.277 



SPDP 

0.782 

0.784 

0.789 

0.780 

0.779 

0.775 



HPPMx 

0.034 

0.026 

0.095 

0.026 

0.025 

0.029 

A 

1 

HPPM 

0.791 

0.795 

0.799 

0.779 

0.784 

0.777 


SP 

1.324 

1.319 

1.306 

1.285 

1.300 

1.279 



SPDP 

0.785 

0.789 

0.782 

0.779 

0.782 

0.776 



HPPMx 

0.022 

0.025 

0.108 

0.023 

0.041 

0.037 

A = 

10 

HPPM 

0.786 

0.791 

0.800 

0.774 

0.776 

0.784 

SP 

1.321 

1.307 

1.292 

1.282 

1.278 

1.289 



SPDP 

0.785 

0.783 

0.779 

0.775 

0.777 

0.780 



HPMMx 

0.155 

0.203 

0.254 

0.142 

0.151 

0.242 

A 

0.1 

HPMM 

0.882 

0.848 

0.852 

0.854 

0.828 

0.837 


SP 

1.315 

1.312 

1.326 

1.277 

1.287 

1.283 



SPDP 

0.783 

0.786 

0.787 

0.771 

0.776 

0.777 



HPPMx 

0.100 

0.102 

0.170 

0.045 

0.069 

0.102 

A = 

1 

HPPM 

0.796 

0.807 

0.829 

0.778 

0.789 

0.795 

SP 

1.312 

1.316 

1.306 

1.257 

1.279 

1.288 



SPDP 

0.783 

0.785 

0.785 

0.766 

0.776 

0.776 



HPPMx 

0.080 

0.095 

0.157 

0.047 

0.066 

0.113 

A = 

10 

HPPM 

0.823 

0.817 

0.836 

0.798 

0.794 

0.809 

SP 

1.331 

1.305 

1.327 

1.291 

1.269 

1.284 



SPDP 

0.789 

0.780 

0.787 

0.778 

0.772 

0.776 


To assess curve smoothness we calculate the standard deviation of the lag one dif¬ 
ferences from the estimated curve 


ISDi = 


N 


1 

n — 3 


n— 1 

5>*t - w*) 2 , 

t =1 


(17) 


where lag it = fi{z it+ 1 ) - ) for t = 1 , ..., n - 1 and = l/(n - 1 ) la 9it- 

Large values of £SDi generally indicate more wiggliness relative to small values. Values 
provided in Table 4 correspond to 


1 

D 


D 


m ' 

d—1 i—l 


(18) 


From Table 4 it appears that curves become less wiggly as n increases relative to 
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Table 3: Results from the simulation study investigating goodness-of-fit. Table entries 
are calculated using (16) with m = 60 players. 

n = 50 n = 100 


Number of knots Number of knots 




Model 

5 

15 

30 

5 

15 

30 



HPPMx 

0.979 

0.990 

0.950 

0.981 

0.989 

0.984 

A = 

0.1 

HPPM 

0.813 

0.908 

0.919 

0.812 

0.939 

0.952 

SP 

0.985 

0.993 

0.994 

0.984 

0.992 

0.992 



SPDP 

0.802 

0.794 

0.774 

0.764 

0.759 

0.743 



HPPMx 

0.981 

0.965 

0.864 

0.984 

0.991 

0.979 

A 

1 

HPPM 

0.983 

0.987 

0.950 

0.984 

0.991 

0.989 


SP 

0.985 

0.993 

0.994 

0.984 

0.992 

0.992 



SPDP 

0.808 

0.811 

0.798 

0.771 

0.743 

0.746 



HPPMx 

0.984 

0.975 

0.853 

0.984 

0.990 

0.979 

A = 

10 

HPPM 

0.985 

0.987 

0.937 

0.984 

0.991 

0.992 

SP 

0.985 

0.993 

0.994 

0.984 

0.992 

0.992 



SPDP 

0.802 

0.793 

0.787 

0.765 

0.747 

0.750 



HPPMx 

0.583 

0.593 

0.572 

0.574 

0.593 

0.575 

A 

0.1 

HPPM 

0.485 

0.537 

0.497 

0.476 

0.552 

0.541 


SP 

0.613 

0.663 

0.713 

0.570 

0.575 

0.533 



SPDP 

0.538 

0.554 

0.565 

0.515 

0.521 

0.520 



HPPMx 

0.585 

0.608 

0.630 

0.572 

0.589 

0.605 

A 

1 

HPPM 

0.589 

0.618 

0.663 

0.574 

0.596 

0.616 

JA — 

SP 

0.613 

0.662 

0.711 

0.566 

0.575 

0.531 



SPDP 

0.541 

0.555 

0.562 

0.511 

0.522 

0.523 



HPPMx 

0.587 

0.610 

0.626 

0.574 

0.589 

0.607 

A = 

10 

HPPM 

0.593 

0.624 

0.667 

0.577 

0.597 

0.617 

SP 

0.613 

0.662 

0.712 

0.568 

0.574 

0.535 



SPDP 

0.539 

0.554 

0.561 

0.514 

0.519 

0.522 


the number of knots. This is expected. Also unsurprising is that HPPMx and HPPM 
produce similar values of (18) with the biggest differences occurring when w 2 (within 
player variability) and A (within cluster variability) are small. What is a bit surprising 
is that the value of A doesn’t much alter curve smoothness for HPPMx. It appears that 
w 2 is more influential. Overall, since the values of (18) for HPPMx are fairly similar to 
those for SP and SPDP (recall that SP and SPDP are not influenced by A), penalizing 
curves directly with a P-spline prior produces curves with similar smoothness as those 
produced through the hierarchical model. 

To see how the PPMx prior improves clustering relative to the PPM prior, Table 5 
provides the number of estimated clusters (k m ) averaged over all D = 100 data sets. 
For each data set p was estimated using Dahl (2006) ’s method which is based on least- 
squares distance from the matrix of posterior pairwise co-clustering probabilities (note 
that an estimate of p also provides an estimate of k m ). 
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Table 4: Results from the simulation study investigating smoothness. Table entries are 
calculated using (18) with m — 60 players. 

n = 50 n = 100 


Number of knots Number of knots 




Model 

5 

15 

30 

5 

15 

30 



HPPMx 

0.162 

0.165 

0.171 

0.082 

0.082 

0.087 

A = 

0.1 

HPPM 

0.143 

0.154 

0.165 

0.073 

0.078 

0.084 

SP 

0.162 

0.162 

0.166 

0.081 

0.080 

0.083 



SPDP 

0.154 

0.153 

0.157 

0.077 

0.076 

0.079 



HPPMx 

0.163 

0.165 

0.170 

0.081 

0.081 

0.084 

A 

1 

HPPM 

0.163 

0.164 

0.168 

0.081 

0.081 

0.085 


SP 

0.162 

0.162 

0.166 

0.081 

0.080 

0.083 



SPDP 

0.154 

0.154 

0.158 

0.077 

0.076 

0.079 



HPPMx 

0.163 

0.165 

0.169 

0.082 

0.081 

0.085 

A = 

10 

HPPM 

0.162 

0.164 

0.168 

0.082 

0.081 

0.086 

SP 

0.161 

0.162 

0.166 

0.081 

0.080 

0.083 



SPDP 

0.153 

0.154 

0.158 

0.077 

0.076 

0.079 



HPPMx 

0.179 

0.208 

0.274 

0.096 

0.107 

0.132 

A 

0.1 

HPPM 

0.155 

0.182 

0.250 

0.082 

0.096 

0.120 


SP 

0.195 

0.206 

0.258 

0.088 

0.071 

0.062 



SPDP 

0.172 

0.174 

0.191 

0.095 

0.090 

0.095 



HPPMx 

0.184 

0.210 

0.288 

0.097 

0.109 

0.136 

A 

1 

HPPM 

0.183 

0.209 

0.298 

0.097 

0.108 

0.140 

JA — 

SP 

0.198 

0.205 

0.258 

0.088 

0.071 

0.061 



SPDP 

0.174 

0.175 

0.190 

0.094 

0.090 

0.095 



HPPMx 

0.184 

0.212 

0.286 

0.097 

0.110 

0.137 

A = 

10 

HPPM 

0.185 

0.210 

0.296 

0.100 

0.108 

0.141 

SP 

0.197 

0.206 

0.258 

0.088 

0.071 

0.062 



SPDP 

0.173 

0.175 

0.189 

0.094 

0.090 

0.095 


The true value of k m in Table 5 is six for all scenarios. It appears that as n increases, 
the PPMx prior tends to converge to the six clusters faster than the PPM prior. It also 
appears that the clustering mechanisms of the HPPMx and HPPM depend on A. This 
is to be expected however, because as A increases curves are allowed to deviate further 
from cluster specific means, thus creating more clusters. 


6 Analysis and Results 

In this we section provide results of fitting HPPMx to the NBA data set. 

6.1 Model Details and Prior Selection 

We first provide a bit of detail regarding cohesion and similarity functions used and then 
on prior values. The cohesion and similarity functions employed match the PPMx prior 
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Table 5: Results from the simulation study investigating cluster estimation. Table entries 
correspond to the number of estimated clusters averaged over 100 simulated data sets. 







n = 50 



n = 100 






Number of knots 

Number of knots 




Model 

5 

15 

30 

5 

15 

30 


A 

0.1 

HPPMx 

5.98 

5.98 

5.99 

5.97 

5.98 

5.96 



HPPM 

3.93 

4.26 

4.49 

4.32 

4.37 

4.40 

w 2 = 0.1 

A 

1 

HPPMx 

9.24 

8.53 

6.52 

10.93 

10.26 

9.02 


HPPM 

9.49 

9.18 

7.89 

10.85 

10.77 

10.02 


A 

10 

HPPMx 

10.32 

8.55 

6.42 

13.39 

10.34 

8.86 


Ji — 

HPPM 

10.45 

8.31 

7.14 

11.84 

9.67 

8.94 


A = 

0.1 

HPPMx 

5.97 

5.96 

5.99 

5.98 

5.97 

5.99 


HPPM 

4.09 

4.36 

4.72 

4.02 

4.54 

4.66 

w 2 = 1 

A = 

1 

HPPMx 

7.25 

6.87 

6.11 

8.20 

7.94 

6.82 

HPPM 

7.08 

6.74 

6.15 

7.94 

7.77 

7.40 


A = 

10 

HPPMx 

7.24 

6.89 

6.28 

8.90 

8.06 

7.08 


HPPM 

5.84 

6.03 

5.36 

7.14 

6.88 

6.94 


to the marginal prior on partitions implied by the DP prior. This results in an a priori 
clustering of a few large clusters that represent typical player production and a few 
smaller clusters of “abnormal” players. Thus, we set c(Sj) = M(\Sj \ — 1)! with M = 1 
favoring a small number of clusters. The similarity functions used are typical conjugate 
models for continuous and categorical variables with parameter values suggested by 
Muller et al. (2011) resulting in 


g(x*) = gi(x* 1 )g 2 (x* 2 )g 3 {x* 3 ) 

= f JJ N(xa-,mj, l)N(mj;0,10)^i,x i2 

J ieSj 

x Dir(-Ki tXi2 ; 0 . 1 , 0 . 1 , 0.1)Tr iiXi3 Dir(ni !Xi3 ; 0 . 1 , 0 . 1 , 0.1 )dmjd'ir lj dn 2 j 
N nj (xf, 0 , 1)N( 0; 0,10) TgLi n ljc + 0.1) T(ELi » 2 jc + 0-1) 

N(m- 0, s 2 ) nLi nnijc + 0.1) U.U r (^c + o.l)' 

7 Tj >Xi2 and 7 Ti tXi3 denote xa and X& s probability vector, nij c are the number of players 
in cluster j that have covariate value c for Xi 2 and n 2 j C the number of players for xt 3 
(as a reminder, xn corresponds to age, Xi 2 experience and Xi 3 draft order). In addition, 
s 2 = (rij + 1/10 ) -1 and m = S 2 

A first-order Bayesian P-spline prior was used and following suggestions in Lang and 
Brezger (2004), we set a = 1 and b = 0.05. We found that results were fairly robust to 
reasonable prior specifications of t 2 . From the simulation study setting A = 1 seemed 
reasonable so that individual curves are fairly similar to their cluster-specific counter¬ 
parts. Preliminary investigations indicated that methodology is robust to variance prior 
specifications so with hopes of being diffuse we set a CT = b a = as = b$ = = 1 

and s 2 o = 100 2 . To produce a flat prior for 7 we use m 1 = 0 and s 2 = 100 2 . Since there 
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are 82 games in an NBA season we set m ai =76 (taking into account missed games 
due to injury) with s 2 = 10 2 . 

The MCMC algorithm was run until 1000 iterates from a Markov chain were col¬ 
lected after discarding the first 25,000 as burn in and thinning by 25. Convergence was 
monitored using MCMC iterate trace plots. 


6.2 Fits of Individual Player Production Curves 

Figure 3 displays the posterior mean curves with 95% credible and prediction bands 
for the three players introduced in Figure 1. Notice that even though the fits are fairly 
flexible they are smoothed relative to the loess fits provided in Figure 1. 



Figure 3: Posterior fits for three NBA players. The solid red lines are point-wise posterior 
mean curves, the dashed red lines are 95% mean point-wise credible intervals, and the 
dashed orange lines are 95% point-wise prediction intervals. 


6.3 Active Player Prediction 

Displaying the results of active player prediction in and of itself is not trivial as curves 
depend completely on the predicted values of rii. To simplify the process we display 
the active player prediction curves conditioned on E(m\yi). This requires producing 
a curve conditioned on E(rii\yi) for each MCMC iterate of (3. From these MCMC 
iterates, we estimate an average prediction curve with point-wise 95% credible bands 
and prediction bands. Figure 4 contains the estimated mean curve with credible bands 
and prediction bands corresponding to four players in varying stages of their career. 
Shaquille O’Neal played beyond the 2009/2010 season but has since retired. His average 
number of predicted games played turned out to be 1545 and the actual number of 
games played is 1423 (including playoffs). Ray Allen continues to play but is nearing 
the end of his career and the predicted sharp decrease in production mirrors reality. 
Dwight Howard and Luke Ridnour are two completely different types of players and 
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number ol games 


Figure 4: Active player predictions for four NBA players. Lines associated with active 
player prediction are blue. The dashed lines represent point-wise 95% mean credible 
bands, and the dotted lines 95% point-wise prediction bands. 

are provided as a means to demonstrate the flexibility in the predictions. E{rii\yi) 
for Dwight Howard is quite conservative and barring injury should under estimate his 
career game total, while E(m\yi) for Beno Udrih is quite reasonable. Regardless, the 
four predictions display completely plausible decreases in production as the players 
approach retirement. 


6.4 Career Prediction Analysis 

For career prediction we employ the predictive distributions as described in Section 4.2 
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Top 5 pick 1st Round 2nd Round 
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Predicted Number of Games Played 

— - High School, 19 years • • • College, 19 years • • * College, 21 years College, 23 years 

•— International, 19 years • — International, 21 years — International. 23 years 


Figure 5: Career prediction curves for different levels of draft order, playing experience 
and age during first game played. 


resulting in the curves found in Figure 5. We include the High School level of experience 
even though the latest CBA requires at least one year post high school experience 
before being drafted. For age during first game played, we considered 19, 21, and 23 
years old. (We do not consider ages 21 and 23 for High School level of experience as 
those scenarios are practically impossible.) The curves are presented conditioned on the 
predicted number of games played averaged over all active players that belong to each 
respective group. 

Before describing results it is important to keep in mind that only players who 
played at least three years were included in the analysis. This explains the seemingly 
high predictions for second round picks. Also, from Table 1 it can be seen that only 
one player (considered in the analysis) was drafted in the second round straight from 
high school (Rashard Lewis). So you will notice that the predicted curve for this group 
follows a trajectory similar to that of Rashard Lewis’s career. Even with that being the 
case, a few interesting trends emerge. It appears that there is much more variability in 
curve location for Top Five Picks. Also the players that are Top Five Picks tend to reach 
their max production earlier in their career. Age clearly influences a player’s production 
as players that start their career at a younger age tend to have higher production 
rates. It appears that the shapes of curves vary by experience with international players 
decreasing slightly earlier relative to college or high school players. 

Table 6 provides estimates of the number of games need to reach peak performance. 
Generally speaking players drafted straight out of high school take longer to reach 
maximum performance while those with college experience are the quickest. That said, 
any conclusions drawn from Table 6 or Figure 5 should be made with care as some of 
the curves are accompanied with a moderate to substantial amount of variability. 
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Table 6: Predicted game at which max performance is attained (prediction errors are 
in parenthesis). 


Age During First Game Played 


Draft 

Experience 

19 

21 

23 

Top 5 

High School 

College 

International 

477(169.8) 

385(162.2) 

399(155.6) 

437(184.9) 

448(160.7) 

386(178.9) 

423(147.1) 

1st Round 

High School 

College 

International 

472(147.4) 

436(185.5) 

446(151.3) 

417(193.5) 

473(172.4) 

403(200.2) 

413(167.3) 

2nd Round 

High School 

College 

International 

568(184.7) 

344(152.6) 

409(144.6) 

386(179.1) 

416(151.1) 

369(179.7) 

419(169.6) 


Cluster Analysis 

A nice property of the model is the ability to postulate what characteristics guide 
clustering. To do this it is necessary to obtain a point estimate using cluster MCMC 
iterates. Since posterior summaries of cluster specific parameters are arbitrary, using 
typical posterior summaries (mean and median) makes little sense. We employ Dahl 
(2006) ’s method which is based on least-squares distance from the matrix of posterior 
pairwise co-clustering probabilities. Using this method produces a partitioning of the 
408 players into 18 clusters with cluster membership ranging from 3 to 63 players. 
Figure 6 provides player-specific posterior mean curves for nine clusters. Except for 
cluster 6, these clusters represent those that contain the highest number of players 
(approximately 80% of players). Cluster 6 was selected as it contains curves that are in 
our opinion more interesting than clusters not shown. The remaining nine clusters for 
the most part are comprised of role players. Although each cluster contains curves that 
are slightly unique, they are relatively flat. The dashed segments at the end of some 
curves are active player predictions. To facilitate comparisons we maintain the x-axis 
on the percentile number of games played scale. We highlight a few of the clusters by 
pointing out some of the well known players. Cluster f is comprised of role players (e.g., 
Tony Allen and Matt Bonner) whose production is constant. Cluster 2’s key member is 
LeBron James. Players in this cluster begin careers close to peak level and appear to 
maintain production. Cluster 3 contains Carlos Boozer and Ron Artest who had sharp 
increase in production but maintained peak performance for a short time with a gradual 
decrease in performance. Cluster 4 is comprised of role players who showed an increase 
in production right before retiring (e.g. James Posey). Kobe Bryant is the key player 
of Cluster 5 (along with Chauncey Billups, Steve Nash). In this cluster, players started 
slow but experienced large sharp increases of production and maintained it for much of 
their career. Clusters 6 and 7 are comprised of players who start at peak performance 
and gradually decline through out their career with players in Cluster 6 showing a 
brief increase towards the end of their career. Grant Hill is a member of Cluster 6 and 
Shaquille O’Neal is a member of Cluster 7. Clusters 8 and 9 are primarily comprised of 
role players with Cluster 8 showing gradual increase until the later stages of a career. 
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Figure 6: Player specific posterior mean production curves divided into the 9 clusters 
corresponding to the partition estimated using Dahl (2006) ’s cluster estimate method. 
Active player prediction for active players is displayed by a dashed line. 


An example is Matt Barnes. Marcus Camby is member of Cluster 9 and the decrease in 
production towards the end of the career is more sharp relative to Cluster 8. Overall, 
the clusters contain curves that have distinct shapes. Information provided in Figure 6 
could potentially be used to guide NBA personnel decisions regarding contract length 
and amount. For example, production for players in 3 begins to decrease earlier than 
Cluster 2. 
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Finally, Figure 7 displays the average age during first game played vs. the percent of 
players in each of the six categories for each of the 18 clusters. Apart from demonstrating 
the presence of an interaction between the three covariates, these plots confirm what is 
already widely known. That is, on average, the age of players increases as draft order 
increases and players that play college tend to begin NBA careers at an older age relative 
to high school and international players. 

6.5 Assessing Trade-off between Individual Player Fits and 
Prediction 

As mentioned previously, incorporating the PPMx in the hierarchical model improves 
predictions at the cost of a small loss in individual fits. To show that the cost is minimal 
relative to gains in prediction, we randomly selected 50 retired players and removed the 
final 25% of games played (essentially treating them as active players). We then pro¬ 
ceeded to fit four models (details follow) to these partitioned data and assess model fit 
through Mean Square Error (MSE). To assess prediction accuracy, active player pre¬ 
diction was carried out for each of the 50 randomly selected players and Mean Squared 
Prediction Error (MSPE) was computed. The four models considered were the SP model 
(12), an extension of the SP model that improves prediction, the HPPMx, and the fol¬ 
lowing 5th degree polynomial regression model: 

5 

y it = x[/3 + '^2^ ji t 3 + e it with e it N(0, of) and (3 ~ N(0,s 2 I) 

3=0 

7 i ~ N(fi, T ) where T = diag(r 0 2 ,..., r|) 
li ~ N(0 ,s 2 I). 

The SP model (12) was extended in the following way 

Oi ~ N^yx- 1 ) 

~ N(0,s 2 I). 

We refer to this model as hSP (hierarchical semi-parametric). Predicting (or extrapo¬ 
lating) the last 25% of games played using the SP model requires drawing 9 's associated 
with knots for removed games from its prior distribution. Therefore, centering the prior 
on a vector of global spline coefficients should improve prediction relative to a prior 
centered at 0. 

The MSE averaged over the 408 players was calculated for each of the four models 
resulting in Polynomial (30.72), SP (29.66), hSP (29.71), HPPMx (31.71). As expected 
the flexible penalized splines produce the smallest MSE and HPPMx has the highest 
MSE illustrating the surrender of a bit of individual player fit. The MSPE averaged 
over the 50 randomly selected players turned out to be Polynomial (476.32), SP (34.00), 
hSP (31.10), and HPPMx (26.90). As expected HPPMx gained quite a bit in terms of 
prediction (extrapolation) accuracy at a fairly minimal individual fit cost. 
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Figure 7: Average baseline age and percent of players for each of the 18 clusters for 
the six categories corresponding to experience and draft order. The size of the dot is 
proportional to the number of players allocated to the cluster. 
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7 Conclusions 

We have proposed a completely novel methodology that incorporates information re¬ 
garding the shape of longitudinal curves in predicting future NBA player production 
based on age, experience, and draft order of player. Clearly, curve shape provides in¬ 
formation beyond available covariates and the inclusion of this information in modeling 
efforts should improve inferences. In addition, the methodology does well in balanc¬ 
ing individual fits and producing clusters that provide adequate borrowing of strength 
among players. The PPMx prior employed does a nice job of being able to incorporate 
both covariate and curve shape information when forming clusters and ultimately bor¬ 
rowing strength among subjects to improve active player and career predictions. From 
a basketball perspective, individual player production clearly depends on many omitted 
variables (such as team strength, injury history and coaching philosophy) and these vari¬ 
ables can be easily incorporated in the model using the PPMx prior when they become 
available. Finally, though the methodology was demonstrated using production curves 
of NBA basketball players, the idea of incorporating curve shape in inferences should 
be applicable in a wide variety of settings (e.g., biomedical, finance, and environmental 
studies). 
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Appendix 

1 John Hollinger’s Game Score 

The Hollinger game score is the following linear combination of statistics appearing in 
a typical box-score summary of each players game statistics: 

• PTS = total points scored by player in game 

• FGM = number of shots that player made in game 

• FGA = number of shots that player attempted in game 

• FTM = number of free throws made in game 

• FTA = number of free throw attempts in game 

• OREB = number of offensive rebounds 

• DREB = number of defensive rebounds 

• STL = number of steals 

• AST = number of assists recorded 

• BLK = number of blocked shots recorded 

• TO = number of turn overs 

• PF = personal fouls. 

Game Score = PTS + FGM x 0.4 - FGA x 0.7 - {FTA - FTM) x 0.4 + OREB 
x 0.7 + DREB x 0.3 + STL + AST x 0.7 + BLK x 0.7 — PF x 0.4 
-TO. 


2 Full Conditionals 


We list the full conditionals used in the Gibbs sampler. In what follows we use [0|—] 
to denote the distribution of 6 conditioned on all other parameters and data and rih 
denotes the number of subjects belonging to cluster h. Also for notational convience, 
Hi denotes the B-spline basis design matrix for both gi — 1 and g\ = 0. 
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